The identification of a growing number of novel Mendelian disorders and private mutations in the Roma (Gypsies) points to their unique genetic heritage. Linguistic evidence suggests that they are of diverse Indian origins. Their social structure within Europe resembles that of the jatis of India, where the endogamous group, often defined by profession, is the primary unit. Genetic studies have reported dramatic differences in the frequencies of mutations and neutral polymorphisms in different Romani populations. However, these studies have not resolved ambiguities regarding the origins and relatedness of Romani populations. In this study, we examine the genetic structure of 14 well-defined Romani populations. Y-chromosome and mtDNA markers of different mutability were analyzed in a total of 275 individuals. Asian Y-chromosome haplogroup VI-68, defined by a mutation at the M82 locus, was present in all 14 populations and accounted for 44.8% of Romani Y chromosomes. Asian mtDNA haplogroup M was also identified in all Romani populations and accounted for 26.5% of female lineages in the sample. Limited diversity within these two haplogroups, measured by variation at eight short-tandem-repeat loci for the Y chromosome and by sequencing of HVS1 for the mtDNA, is consistent with a small group of founders splitting from a single ethnic population in the Indian subcontinent. Principal-components analysis and analysis of molecular variance indicate that genetic structure in extant endogamous Romani populations has been shaped by genetic drift and differential admixture and correlates with the migrational history of the Roma in Europe. By contrast, social organization and professional group divisions appear to be the product of a more recent restitution of the caste system of India.


Introduction 


The Roma (Gypsies) became one of the peoples of Europe when they arrived in the Byzantine Empire 900-1,100 years ago (Fraser 1992; Rochow and Matschke 1991). The present-day Romani populations of European countries are the compound product of the early migrations from the Balkans into western Europe, completed by the 15th century, and of three superimposed migration waves: the first at the end of the 19th century, after the abolition of Gypsy slavery in Romania (Hancock 1987; Fraser 1992; Liégeois 1994); the second out of Yugoslavia, during the 1960s and 1970s; and the third during the last decade, following the political and economic changes in eastern Europe (Reyniers 1995). Current estimates of the total Romani population size in Europe range from 4 million to 10 million, with the largest numbers concentrated in central and southeastern Europe (Liégeois 1994; Marushiakova and Popov 2001b).

In recent years, novel single-gene disorders (see Kalaydjieva et al. 1996, 2000; Angelicheva et al. 1999; Tournev et al. 1999; Rogers et al. 2000; Thomas et al. 2001), as well as private mutations causing known Mendelian disorders (see Piccolo et al. 1996; Abicht et al. 1999; Kalaydjieva et al. 1999; Plasilova et al. 1999), have been identified in the Roma. Large Romani families with psychiatric disorders are being used in an effort to localize susceptibility genes (Kaneva et al. 1998), and epidemiological evidence suggests that the Roma and surrounding European populations differ in the prevalence of other complex disorders, such as Parkinson disease and multiple sclerosis (Kalman et al. 1991; Milanov et al. 2000). The Roma are thus emerging as an interesting founder population whose potential for genetic research has yet to be explored.

The complex structure of Romani society, where the Romani group is the primary unit, has long attracted the attention of cultural anthropologists (Petulengro 1915-16; Fraser 1992; Marushiakova and Popov 1997). Liégeois (1994, p. 61) describes the current social organization of the Roma as a “fluid mosaic of diversified groups.” Group identity and the ensuing social divisions are based on a variety of criteria, such as customs, ethnonyms describing traditional trades, and dialects reflecting the history of migrations. The greatest diversity is found in the Balkans, where numerous Romani populations with well-defined social boundaries exist (Marushiakova and Popov 1997, 2001a). This social organization and its strong impact on rules of endogamy have not been addressed in genetic research. Population-genetic studies of the Roma from different European countries have been performed for nearly 80 years and have mostly sought to compare the Roma with autochthonous Europeans and to identify genetic affinities with proposed parental populations and with other Romani populations. The low resolution of individual classical genetic markers and the random sampling design have often led to contradictory results. Nonetheless, these studies have generally concluded that the Roma are genetically distinct from other European populations while, at the same time, different Romani populations are separated by larger genetic distances than are their European neighbors (reviewed by Kalaydjieva et al. [2001b]). Recent medical-genetic studies have shown that founder mutations can be shared by socially diverse and geographically dispersed Romani populations, whereas Romani populations living in close geographic proximity can display markedly different gene frequencies (reviewed by Kalaydjieva et al. [2001b]). Thus, social practices, as well as genetic data, suggest significant population substructure. The relationship between traditional group divisions and biological affinities, however, is unclear and appears to be complex. Current patterns, genetic as well as social, could be the product of diverse scenarios, with different implications for genetic epidemiology.

In this study, we address the issue of genetic relatedness behind the social and cultural diversity of Romani populations. We have used Y-chromosome and mtDNA markers of different mutability to examine the origins and diversification of paternal and maternal lineages in 14 well-defined Romani populations. The findings point to common Asian origins and suggest that the early history of splits and migrations in Europe has played a major role in shaping current genetic structure.
Subjects and Methods


Study Populations 


This study included 275 unrelated males from 14 traditional Romani populations, selected to represent different cultural-anthropological classification criteria (Marushiakova and Popov 1997) and to allow an assessment of their genetic relevance. Group characteristics and numbers sampled are shown in table 1. Most populations are well defined and endogamous relative to each other, except for the Lingurari, Monteni, and Intreni, who are separated by geographic distance rather than by rules of endogamy. The previously described Kalderash, Monteni, and Lom populations (Kalaydjieva et al. 2001a) were typed for additional loci, and the Lom sample size was expanded.

The analyses also included samples from 40 males from Asia and the Middle East who were found to carry Y-chromosome haplogroups VI-68 and VI-56, as defined by mutations M82 and M67, respectively (Underhill et al. 2000). These samples were genotyped for the Y-chromosome short-tandem-repeat (Y STR) markers used in this study.

This study is part of an ongoing project investigating the molecular epidemiology of single-gene disorders and the population structure of the Roma, conducted in collaboration with Romani organizations and local health authorities. Research into genetic epidemiology (to be published separately) involves carrier testing for private founder mutations, with genetic counseling provided to all participating subjects. Informed consent for both aspects of the study has been obtained from all individuals involved. This study complies with the ethical guidelines of the participating institutions.


Y-Chromosome Analysis 


This part of the study included 252 Romani and 40 non-Romani male subjects. As suggested by de Knijff (2000), we designate Y chromosomes defined by unique-event polymorphisms (UEPs) as “haplogroups,” those defined by Y STRs as “haplotypes,” and those defined by both UEPs and Y STRs as “lineages.” Haplogroup designation follows the nomenclature proposed by Underhill et al. (2000).


Y-Chromosome Haplogroups


Comprehensive analysis of UEPs was performed as described (Underhill et al. 1997, 2000, 2001; Shen et al. 2000) on 94 Romani males, with the aim of identifying the major Y-chromosome haplogroups in the Roma. The remaining 158 samples were typed for the M82 locus, a 2-bp deletion in derived Y chromosomes that defines haplogroup VI-68 (Underhill et al. 2000). PCR amplification was done with fluorescently labeled primers 5′-CTGTACTCCTGGGTAGCCTGT-3′ and 5′-AAGAACGATTGAACACACTAACTC-3′. The products were separated by size on a 377 DNA Analyzer (Applied Biosystems).


Table 1

Description of the Romani Populations Included in the Study

Populationᵃ | Place of Residence | Traditional Trade | Language/Dialect | History of Migrations | Religion | Sample Size
Turgovzi (Tu) | Bulgaria, Omurtag | Merchants | Romanes, Balkan dialect; Turkish | Early settlement in Bulgaria | Islam | 36
Feredjdli (Fe) | Bulgaria, Omurtag | Unskilled laborers | Turkish | Early settlement in Bulgaria | Islam | 21
Kalaidjii North (KN) | Bulgaria, Lom | Tinsmiths | Romanes, Balkan dialect | Early settlement in Bulgaria | Protestant | 20
Koshnichari South Central (KC) | Bulgaria, Plovdiv region | Basket makers | Romanes, Balkan dialect | Early settlement in Bulgaria | Eastern Orthodox | 4
Koshnichari Southwest (KW) | Bulgaria, Gotze Delchev | Basket makers | Romanes, Balkan dialect | Early settlement in Bulgaria | Protestant | 5
Kalaidjii South (KS) | Bulgaria, Gotze Delchev | Tinsmiths | Romanes, Old Vlax dialectᵇ | Wallachia/Moldavia, to Bulgaria in 17th and 18th centuries | Eastern Orthodox | 10
Lom (Lo) | Bulgaria, Lom | Livestock dealers | Romanes, Old Vlax dialectᵇ | Wallachia/Moldavia, to Bulgaria in 17th and 18th centuries | Protestant | 43
Monteni (Mo) | Bulgaria, Balkan Mountain villages | Bowl makers | Archaic Rumanian | Wallachia/Moldavia, to Bulgaria in late 19th century | Eastern Orthodox | 42
Intreni (In) | Bulgaria, Letnitza | Bowl makers | Archaic Rumanian | Wallachia/Moldavia, to Bulgaria in late 19th century | Eastern Orthodox | 17
Lingurari North (LN) | Bulgaria, northern part | Bowl makers | Archaic Rumanian | Wallachia/Moldavia, to Bulgaria in late 19th century | Eastern Orthodox | 18
Lingurari South (LS) | Bulgaria, southern part | Bowl makers | Archaic Rumanian | Wallachia/Moldavia, to Bulgaria in late 19th century | Eastern Orthodox | 9
Kalderash (Ka) | Bulgaria, northern part | Coppersmiths | Romanes, New Vlax dialectᵇ | Wallachia/Moldavia, to Bulgaria in late 19th century | Eastern Orthodox | 23
Spanish Roma (SR) | Madrid | Merchants | Spanish | Early migration to north/western Europe | Protestant | 27
Lithuanian Roma (LR) | Vilnius, Lithuania | Merchants | Romanes | Early migration to north/western Europe | Roman Catholic | 20

ᵃ Two-letter abbreviations of population names are used in tables throughout this article.
ᵇ Vlax dialects are characterized by a strong linguistic influence from Romanian.
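The M82 typing described above reduces to a size call: a derived (haplogroup VI-68) chromosome yields a PCR product 2 bp shorter than an ancestral one. The Python sketch below illustrates only that logic. The ancestral amplicon length `ANCESTRAL_SIZE_BP` and the sizing tolerance are hypothetical placeholders, not values reported for this assay, and `call_m82` is an illustrative name.

```python
# Hypothetical sketch of calling the M82 allele from a PCR fragment size.
# The derived allele carries a 2-bp deletion, so its amplicon is 2 bp shorter.
# ANCESTRAL_SIZE_BP is an assumed placeholder, not a value from the study.
ANCESTRAL_SIZE_BP = 150.0

def call_m82(observed_bp, ancestral_bp=ANCESTRAL_SIZE_BP, tolerance=0.6):
    """Return 'ancestral', 'derived', or None if the size fits neither allele."""
    expected = {"ancestral": ancestral_bp, "derived": ancestral_bp - 2.0}
    allele, size = min(expected.items(), key=lambda kv: abs(kv[1] - observed_bp))
    return allele if abs(size - observed_bp) <= tolerance else None

# Y chromosomes are haploid, so each male should show a single peak.
print(call_m82(148.1))  # derived
print(call_m82(150.2))  # ancestral
print(call_m82(153.0))  # None
```

Returning `None` for sizes that fit neither allele leaves ambiguous samples for manual review rather than forcing a call.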

The 70 samples that carried the ancestral M82 allele were genotyped for specific UEPs, chosen on the basis of the identity of their Y STR haplotypes with the common haplotype(s) of the specific haplogroup in the fully characterized Romani samples. These markers included M1, M45, M67, M89, and M170. M1 was analyzed as described elsewhere (Hammer and Horai 1995). The remaining UEPs were analyzed using a modified version of the primer-extension assay (Bray et al. 2001) (protocol available on request) and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Mass spectra were collected using a Voyager-DE PRO MALDI-TOF instrument (Applied Biosystems). Genotypes were determined manually by calculating the mass of the dideoxynucleotide added onto the primer. This analytical system left five samples for which haplogroup assignment was not possible.
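The manual genotype call amounts to a mass-shift lookup: subtract the primer mass from the extended-product mass and ask which dideoxynucleotide explains the difference. This is a minimal sketch that assumes unmodified ddNTPs and approximate average residue masses. The modified assay of Bray et al. (2001) may use different terminators, so `DDN_ADDED_MASS`, the tolerance, and the example primer mass are all assumptions. The returned base is the one incorporated onto the primer, that is, the complement of the template base.

```python
# Approximate average masses (Da) added to a primer by one incorporated,
# unmodified dideoxynucleotide (ddNTP minus pyrophosphate). Assumed values.
DDN_ADDED_MASS = {"A": 297.2, "C": 273.2, "G": 313.2, "T": 288.2}

def call_extension_base(primer_mass, observed_mass, tolerance=3.0):
    """Return the ddN whose added mass best explains the observed shift, or None."""
    shift = observed_mass - primer_mass
    base, expected = min(DDN_ADDED_MASS.items(), key=lambda kv: abs(kv[1] - shift))
    return base if abs(expected - shift) <= tolerance else None

# Hypothetical primer of 5,000.0 Da; a Y-chromosome UEP is haploid,
# so a single extension peak is expected per sample.
print(call_extension_base(5000.0, 5297.5))  # A
print(call_extension_base(5000.0, 5288.0))  # T
print(call_extension_base(5000.0, 5350.0))  # None
```

The closest pair of added masses (A and T) differs by about 9 Da. That spacing is why a tolerance of a few daltons is used here.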


Y STR Haplotypes 


A total of 209 Romani and 40 non-Romani individuals were genotyped for eight Y STR loci: DYS19, DYS388, DYS389II, DYS389I, DYS390, DYS391, DYS392, and DYS393. In addition, Y STR data for 43 Roma from three populations described by Kalaydjieva et al. (2001a) were expanded by typing for DYS388. PCR primers were as described elsewhere (Kayser et al. 1997). The products were separated on an ABI 373A DNA Analyzer (Applied Biosystems). Allele sizes were converted to repeat numbers by use of allelic ladders, which were analyzed in parallel. We define DYS389CD as equivalent to DYS389I, and we define DYS389AB as equivalent to DYS389II minus DYS389I (Rolf et al. 1998). Haplotypes were constructed following the ascending numerical order of loci given above.
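The DYS389 conversion and the hyphenated haplotype strings used in the tables (marker order as in footnote a of table 2) can be sketched as follows. The raw repeat counts in the example are hypothetical, chosen so that the derived values reproduce the VI-68A haplotype 15-12-16-14-22-10-11-12. The function names are illustrative only.

```python
# Order in which loci are concatenated into haplotype strings.
MARKER_ORDER = ("DYS19", "DYS388", "DYS389AB", "DYS389CD",
                "DYS390", "DYS391", "DYS392", "DYS393")

def derive_dys389(raw):
    """Return a copy of the raw calls with DYS389CD and DYS389AB added."""
    calls = dict(raw)
    calls["DYS389CD"] = raw["DYS389I"]                    # CD equals DYS389I
    calls["DYS389AB"] = raw["DYS389II"] - raw["DYS389I"]  # AB equals II minus I
    return calls

def haplotype_string(raw):
    """Join the derived calls in MARKER_ORDER with hyphens."""
    calls = derive_dys389(raw)
    return "-".join(str(calls[locus]) for locus in MARKER_ORDER)

# Hypothetical raw repeat counts for one male.
raw = {"DYS19": 15, "DYS388": 12, "DYS389I": 14, "DYS389II": 30,
       "DYS390": 22, "DYS391": 10, "DYS392": 11, "DYS393": 12}
print(haplotype_string(raw))  # 15-12-16-14-22-10-11-12
```

Subtracting DYS389I from DYS389II avoids double-counting: the DYS389II amplicon also contains the DYS389I repeat region, so the two raw values are not independent.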


mtDNA 


mtDNA was analyzed in 275 Romani subjects. By analogy to the Y chromosome, mtDNA “haplogroups” are defined by coding-region RFLPs, “haplotypes” are defined by hypervariable segment 1 (HVS1) sequences, and mtDNAs defined by both RFLPs and HVS1 sequences are referred to as “lineages.”


mtDNA Haplogroups 


RFLP analysis of coding regions of the mitochondrial genome was performed on 165 samples by use of standard protocols (Passarino et al. 1996; Richards et al. 1998; Macaulay et al. 1999). This analysis provided an indication of the mtDNA haplogroups present in the Roma. In the 110 samples for which RFLP analysis was not performed, haplogroups were inferred from characteristic HVS1 variants (Macaulay et al. 1999; Simoni et al. 2000).


mtDNA Haplotypes 


HVS1 sequencing was performed on 194 samples. In addition, 81 HVS1 sequences previously reported in the Roma (Kalaydjieva et al. 2001a) were included in the statistical analyses. PCR amplification of the D-loop segment between positions 15997 and 16400 (Anderson et al. 1981) was performed as described elsewhere (Calafell et al. 1996). The samples were sequenced in both directions and were run on an ABI 373A DNA Analyzer (Applied Biosystems). A 360-bp fragment of HVS1, between positions 16023 and 16384, was analyzed.


Data Analysis 


The frequencies of male and female haplotypes, haplogroups, and lineages and the number of shared lineages were determined by direct counting. Diversity indices were determined using ARLEQUIN. Haplotype diversity, h, and its variance, V(h), were calculated according to the method of Nei (1987). Pairwise differences, k, between haplotypes were calculated to provide a measure of the relatedness of haplotypes within haplogroups. Phylogenetic relationships between haplotypes within haplogroups were examined by constructing median-joining networks by use of Network 3.0 (see the Life Sciences and Engineering Technology Solutions web site) (Bandelt et al. 1995).
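For reference, Nei's (1987) unbiased haplotype diversity is h = n/(n - 1) × (1 - Σ p_i²), with sampling variance V(h) = 2/(n(n - 1)) × {2(n - 2)[Σ p_i³ - (Σ p_i²)²] + Σ p_i² - (Σ p_i²)²}. The pure-Python sketch below computes both, together with a mean pairwise difference. It is not ARLEQUIN's implementation. Counting k as the mean number of differing loci (or sites) over all pairs of sampled haplotypes is an assumption made for illustration, and the example sample is hypothetical.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(haplotypes):
    """Nei's (1987) unbiased haplotype diversity h and its sampling variance V(h)."""
    n = len(haplotypes)
    p = [count / n for count in Counter(haplotypes).values()]
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    h = n / (n - 1) * (1 - s2)
    v = 2 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h, v

def mean_pairwise_differences(haplotypes):
    """Mean number of differing loci over all pairs of sampled haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

# Hypothetical sample of four Y STR haplotypes (tuples of repeat counts).
sample = [(15, 12, 16, 14), (15, 12, 16, 14), (16, 12, 16, 14), (16, 12, 16, 14)]
h, v = haplotype_diversity(sample)
print(round(h, 4), round(v, 4), round(mean_pairwise_differences(sample), 4))
# 0.6667 0.0417 0.6667
```

A haplogroup dominated by one or two haplotypes, such as VI-68 in the Roma, gives a low h because Σ p_i² is large.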

The age of the founding Y-chromosome haplogroup VI-68 lineage was calculated as described by Kittles et al. (1998), with a Y STR mutation rate of 2.1 × 10⁻³ per locus per generation (95% confidence interval [CI] 0.6 × 10⁻³ to 4.9 × 10⁻³) (Heyer et al. 1997). The age of the mtDNA haplogroup M lineage in the Roma was determined as suggested by Saillard et al. (2000). Given that most of the actually mutated sites appear to have high mutation rates, the average mutation rate used in the calculations was roughly three times that used by Meyer et al. (1999), that is, one mutation per 6,727 years. The average number of mutations from the ancestral haplotype was computed with Network 3.0 (see the Life Sciences and Engineering Technology Solutions web site) (Bandelt et al. 1995). A generation time of 25 years was used.
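The ρ statistic of Saillard et al. (2000) is the mean number of mutations separating the sampled sequences from the assumed ancestral haplotype. Multiplying ρ by the time per mutation (here one mutation per 6,727 years, as stated above) gives an age estimate. This is a minimal sketch: the distance list in the example is hypothetical, the function names are illustrative, and the confidence-interval calculation is omitted.

```python
def rho(distances_from_root):
    """Mean number of mutations between each sampled sequence and the root haplotype."""
    return sum(distances_from_root) / len(distances_from_root)

def rho_age(distances_from_root, years_per_mutation=6727):
    """Age estimate in years: rho multiplied by the time per mutation."""
    return rho(distances_from_root) * years_per_mutation

# Hypothetical sample: two root sequences, one 1 step and one 2 steps away.
print(rho([0, 0, 1, 2]))      # 0.75
print(rho_age([0, 0, 1, 2]))  # 5045.25
```

In practice, each sequence's distance from the root would be read off the network, for example the median-joining network built with Network 3.0.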

Principal-components (PC) analysis was used to examine the differences in the distribution of Y-chromosome and mtDNA haplogroups among 11 Romani populations for which sample sizes were ≥10 for both data sets. The analysis was performed using the computer program ANTANA, based on eigenanalysis, in which a correlation matrix is generated from standardized frequency data, corrected for sample size.
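ANTANA itself is not reproduced here. As a sketch under stated assumptions, a PC analysis on the correlation matrix of a populations × haplogroups frequency table can be written with NumPy. The sample-size correction mentioned above is not implemented, the frequency table is hypothetical, and `pca_correlation` is an illustrative name.

```python
import numpy as np

def pca_correlation(freqs):
    """PC analysis on the correlation matrix of a populations x haplogroups table."""
    X = np.asarray(freqs, dtype=float)
    # Standardize each haplogroup column (each column needs non-zero variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)       # correlation matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(R)   # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                   # population coordinates on each PC
    explained = eigvals / eigvals.sum()    # proportion of variance per PC
    return scores, explained

# Hypothetical haplogroup frequencies for four populations (rows).
freqs = [[0.5, 0.3, 0.2],
         [0.4, 0.4, 0.2],
         [0.2, 0.3, 0.5],
         [0.6, 0.1, 0.3]]
scores, explained = pca_correlation(freqs)
print(np.round(explained, 3))
```

Using the correlation matrix rather than the covariance matrix gives rare and common haplogroups equal weight. When every haplogroup is included and each row sums to 1, one eigenvalue is expected to be (numerically close to) zero.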


Table 2


Y-Chromosome Lineages Identified in 14 Romani Populations

No. of Y Chromosomes in Population
Lineage Haplotypeᵃ LN LS In Mo Lo KS Ka KN KW KC Tu Fe LR SR Total
VI-57ʰ
A 16-15-19-13-22-10-11-12 1 1 3
B 16-15-18-13-22-10-11-12 1 1
C 16-15-20-14-22-10-11-12 1 1
V-52ⁱ
A 15-13-16-13-24-10-11-13 2 2
B 15-13-16-13-25-10-11-13 1 1
C 15-13-16-13-24-11-11-12 1 1
IX-108ᵏ
A 14-12-17-13-24-11-11-13 1 1
Unknownˡ
A 17-12-16-13-24-10-11-13 1 1
B 16-15-19-13-22-11-11-12 1 1
C 16-13-16-13-23-10-11-13 1 1
D 15-12-18-13-25-11-11-13 1 1
E 13-12-16-14-23-?-13-14 1 1
Total 16 6 7 7 10 11 20 8 4 36 21 20 27 252

ᵃ Constructed using the marker order DYS19-DYS388-DYS389AB-DYS389CD-DYS390-DYS391-DYS392-DYS393.
ᵇ Defined by UEP delAT at locus M82 and accounts for 44.8% of the total population in this study.
ᶜ Defined by UEP A→C at locus M170 and accounts for 22.6% of the total population in this study.
ᵈ Defined by UEP A→T at locus M67 and accounts for 12.7% of the total population in this study.
ᵉ Defined by UEP A→C at locus M173 and accounts for 6.7% of the total population in this study.
ᶠ Defined by UEP T→G at locus M35 and accounts for 3.6% of the total population in this study.
ᵍ Defined by UEP C→T at locus M89 and accounts for 3.6% of the total population in this study.
ʰ Defined by UEP T→C at locus M92 and accounts for 2.0% of the total population in this study.
ⁱ Defined by UEP A→C at locus M217 and accounts for 1.6% of the total population in this study.
ʲ The M217 locus was first reported by Underhill et al. (2001) as defining haplogroup V-52.
ᵏ Defined by UEP 480 at locus M17 and accounts for 0.4% of the total population in this study.
ˡ Unknown haplogroups account for 2.0% of the total population in this study.


Analysis of molecular variance (AMOVA; Excoffier et al. 1992) was performed on the Y STR and mtDNA HVS1 data. Different groupings of populations, based on the criteria outlined in table 1, were considered. The apportionment of genetic variance was assessed between individuals within populations, between populations within groups, and between groups of populations. The analyses were done with ARLEQUIN, using the “sum of squared size difference” setting for Y STR data and the “pairwise differences” setting for mtDNA HVS1 data. Standard Bonferroni corrections were used to account for multiple comparisons.
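A stripped-down AMOVA can clarify how genetic variance is apportioned. The sketch below handles only one hierarchical level (populations, without groups of populations). It uses the sum of squared repeat-size differences between STR haplotypes as the squared distance and omits the permutation tests that ARLEQUIN uses for significance. All names are hypothetical. For HVS1 data, a `dist` that counts differing sites would play the role of the “pairwise differences” setting.

```python
from itertools import combinations

def ssd_distance(a, b):
    """Sum of squared repeat-size differences between two STR haplotypes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def amova_one_level(populations, dist=ssd_distance):
    """Return (sigma2_among, sigma2_within, phi_st) for a list of populations,
    each given as a list of haplotypes (Excoffier et al. 1992 sums of squares)."""
    N = sum(len(p) for p in populations)
    P = len(populations)
    pooled = [h for p in populations for h in p]
    # Sums of squared deviations: each distinct pair counted once, divided by sample size.
    ssd_total = sum(dist(x, y) for x, y in combinations(pooled, 2)) / N
    ssd_within = sum(sum(dist(x, y) for x, y in combinations(p, 2)) / len(p)
                     for p in populations)
    ssd_among = ssd_total - ssd_within
    ms_among = ssd_among / (P - 1)
    ms_within = ssd_within / (N - P)
    n0 = (N - sum(len(p) ** 2 for p in populations) / N) / (P - 1)
    sigma2_within = ms_within
    sigma2_among = (ms_among - ms_within) / n0
    phi_st = sigma2_among / (sigma2_among + sigma2_within)
    return sigma2_among, sigma2_within, phi_st

# Hypothetical single-locus example: two populations fixed for different alleles.
pops = [[(10,), (10,)], [(12,), (12,)]]
print(amova_one_level(pops))  # (2.0, 0.0, 1.0)
```

Because the among-population component is estimated by a difference of mean squares, it can come out negative when populations are more similar than expected by chance.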


Results 


Y-Chromosome Analysis 


The data obtained from the analysis of 252 male Roma are summarized in table 2. A total of nine known haplogroups were identified among the 247 Romani Y chromosomes for which haplogroup assignment was possible. Three haplogroups, namely VI-68, VI-52, and VI-56, occurred at high frequencies (>10%) and together accounted for ~80% of all Y chromosomes. Four haplotypes, VI-68A, VI-68B, VI-52A, and VI-56A, together accounted for 57% of all Y chromosomes.


Major Paternal Founding Lineage 


VI-68 was by far the most common haplogroup. It was observed in all 14 Romani populations and comprised 113 chromosomes, or 44.8% of the overall study population. Haplogroup VI-68 has been found previously at low frequencies in the Indian subcontinent and central Asia but, so far, has not been observed in other European populations (Underhill et al. 2000), with the exception of one individual in Ukraine (Semino et al. 2000).

Y STR analysis of haplogroup VI-68 chromosomes identified 12 haplotypes (VI-68A to VI-68L). In a median-joining network (fig. 1A), these haplotypes clustered tightly together, with a single inferred node. The two high-frequency haplotypes, VI-68A and VI-68B, are centrally located in the network, with the remaining haplotypes radiating from them. The high frequency of these two haplotypes is reflected in the low diversity within this haplogroup (h = 0.47; k = 0.56).

The distribution of VI-68 haplotypes in the Roma was compared with that of non-Romani haplogroup VI-68 chromosomes from different Asian populations. The 22 non-Romani chromosomes presented with 22 different Y STR haplotypes (table 3), including a haplotype that was one mutational step away from the most common Romani lineage, VI-68A. A median-joining network constructed from all 34 haplogroup VI-68 haplotypes (12 Romani and 22 Asian non-Romani) displayed a complex topology, in which the Romani Y chromosomes represented a limited subset of closely related haplotypes within the overall diversity of haplogroup VI-68 (data not shown). The non-Romani haplotypes were widely dispersed across the network, with many inferred nodes.


Figure 1 Median-joining networks of Y STR haplotypes within four haplogroups. A, Haplogroup VI-68 (N = 113; h = 0.47; k = 0.56). B, Haplogroup VI-56 (N = 32; h = 0.87; k = 0.64). C, Haplogroup VI-52 (N = 57; h = 0.76; k = 3.15). D, Haplogroup IX-104 (N = 17; h = 0.94; k = 2.50). The sizes of the nodes are proportional to the relative frequency of that haplotype within the haplogroup. Branch lengths within each network are proportional to the number of mutations separating haplotypes.

A single male lineage, VI-68A, defined by the 2-bp deletion at M82 and by Y STR haplotype 15-12-16-14-22-10-11-12, was shared by 80 individuals from all Romani populations. This common lineage accounted for 71% of haplogroup VI-68 chromosomes and for 32% of all Romani Y chromosomes examined. It was separated by one mutational step (at marker DYS19) from the second most common VI-68 lineage (VI-68B). VI-68B was not as widespread as VI-68A and occurred mostly in the Lom and the Lithuanian Roma (table 2). The remaining haplogroup VI-68 lineages were rare and confined to individual Romani populations. When the most frequent haplotype within haplogroup VI-68 was taken to be the founding lineage, a coalescent date of 992 years ago (95% CI 425-3,472 years) was estimated.
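Statements such as “separated by one mutational step” can be reproduced by counting repeat-unit differences under a simple stepwise-mutation assumption, in which each one-repeat change counts as one step. This minimal sketch compares VI-68A with the non-Romani VI-68 haplotype 15-12-16-14-22-10-11-11 listed in table 3. The helper names are hypothetical. Network 3.0's median-joining algorithm does far more than this pairwise count, and haplotypes with missing alleles (“?”) are not handled.

```python
MARKERS = ("DYS19", "DYS388", "DYS389AB", "DYS389CD",
           "DYS390", "DYS391", "DYS392", "DYS393")

def parse_haplotype(text):
    """Convert '15-12-16-14-22-10-11-12' into a tuple of repeat counts."""
    return tuple(int(allele) for allele in text.split("-"))

def mutational_steps(h1, h2):
    """Stepwise-mutation distance between two Y STR haplotypes.

    Returns the total number of repeat-unit steps and a dict of the loci that differ.
    """
    a, b = parse_haplotype(h1), parse_haplotype(h2)
    per_locus = {m: abs(x - y) for m, x, y in zip(MARKERS, a, b) if x != y}
    return sum(per_locus.values()), per_locus

# VI-68A versus a non-Romani VI-68 haplotype from table 3.
print(mutational_steps("15-12-16-14-22-10-11-12", "15-12-16-14-22-10-11-11"))
# (1, {'DYS393': 1})
```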


Additional Y-Chromosome Lineages 


Haplogroup VI-56 accounted for 12.7% (32 chromosomes) of all Romani males (table 2). It was identified in 6 of the 14 Romani populations and occurred at high frequency in the Lithuanian (25%) and Spanish (33%) Roma. This haplogroup has been found in Pakistan, central Asia, and the Middle East (Underhill et al. 2000). Within Europe, haplogroup VI-56 has been identified in a single male individual from Sardinia (Underhill et al. 2000). In the Roma, the 32 haplogroup VI-56 chromosomes fell into nine Y STR haplotypes, VI-56A to VI-56I (table 2). The pattern of the median-joining network for these haplotypes (fig. 1B) was similar to that described for haplogroup VI-68, with tight clustering of haplotypes and no inferred nodes. Haplogroup-diversity indices were h = 0.87 and k = 0.64. By comparison, 18 non-Romani haplogroup VI-56 chromosomes displayed 11 Y STR haplotypes (table 3), of which one was a single mutational step away from the Romani VI-56A lineage.


Table 3

Y STR Haplotypes Observed in Non-Romani Y-Chromosome Haplogroups VI-68 and VI-56

Haplogroup Frequency DYS19 DYS388 DYS389AB DYS389CD DYS390 DYS391 DYS392 DYS393

VI-68 (N = 22) 1 14 12 14 11 23 10 11 11
1 14 12 15 13 23 10 11 11
1 14 12 16 13 22 10 11 11
1 15 12 14 13 22 10 11 11
1 15 12 14 13 21 10 11 11
1 15 12 15 13 23 10 11 12
4 15 12 15 13 23 10 11 11
1 15 12 15 13 21 10 11 12
1 15 12 16 14 22 11 11 11
1 15 12 16 14 22 10 11 11
1 15 12 17 12 24 10 11 11
1 15 12 17 13 21 10 11 12
1 15 12 17 14 23 10 10 11
1 15 13 15 13 22 10 11 11
1 15 13 16 13 21 10 11 11
1 15 13 17 13 22 10 11 11
1 16 12 14 14 22 10 11 11
1 16 13 16 14 22 10 11 11
1 17 12 14 13 22 10 11 12

VI-56 (N = 18) 3 14 14 15 13 22 10 11 11
1 14 14 15 14 22 10 11 11
2 14 15 15 13 22 9 11 11
5 14 15 15 13 22 10 11 11
1 14 15 15 13 22 9 11 11
1 14 15 16 13 22 10 11 13
1 14 15 17 14 23 10 11 11
1 14 15 17 13 22 10 11 11
1 15 15 17 14 21 10 11 11
1 15 15 17 13 24 10 11 12
1 15 16 16 13 23 11 11 11

Haplogroups VI-52 and IX-104, referred to as “Eu7” and “Eu18” by Semino et al. (2000), accounted for 22.6% and 6.7%, respectively, of all Romani Y chromosomes. These two haplogroups are common in Europe (Underhill et al. 2000), where opposite clinal distributions have been reported (Semino et al. 2000), with higher frequencies of VI-52 in eastern Europe and of IX-104 in the western part of the continent.

Haplogroup VI-52 was identified in 57 males from 11 of the 14 Romani populations (table 2). The majority (52 of 57) were Roma resident in Bulgaria. Y STR analysis identified 15 haplotypes within this haplogroup. Two common haplotypes (VI-52A and VI-52B), contributed primarily by Romani groups that were early settlers in Bulgaria, accounted for 61% of the chromosomes of this haplogroup and for nearly 14% of all Romani Y chromosomes. Haplogroup VI-52 diversity indices were h = 0.76 and k = 3.15. The median-joining network (fig. 1C) contained inferred nodes, with many haplotypes differing from each other by multiple mutational steps.


Haplogroup IX-104 was found in 8 of the 14 Romani populations, with 8 of 17 chromosomes coming from the Lithuanian and Spanish Roma (table 2). Y STR analysis revealed 11 different haplotypes that connect to each other in a median-joining network with a number of inferred nodes (fig. 1D). The diversity indices in haplogroup IX-104 were h = 0.94 and k = 2.50.

The remaining five characterized haplogroups (table 2) were rare, each accounting for <4% of the total sample. Haplogroups VI-57, V-52, and IX-108 have been found in different parts of Asia, and III-36 has been identified in Ethiopia and South Africa (Underhill et al. 2000, 2001). Haplogroup VI-71 has no specific geographic association and is widely distributed throughout the world (Underhill et al. 2000).


mtDNA Diversity 


The results of the mtDNA analysis of 275 Roma are shown in table 4. A total of 12 mtDNA haplogroups were identified, of which 2, haplogroups M and H, accounted for 62% of the overall study population. Analysis of HVS1 revealed 72 unique sequences. Four common lineages, two of haplogroup H and one each of haplogroups M and U3, accounted for 36% of all Romani individuals.


Table 4
mtDNA Lineages Identified in Roma

No. of mtDNA Lineages in Population
Haplogroup and HVS1 Variant(s)ᵃ LN LS In Mo Lo KS Ka KN KW KC Fe Tu LR SR Total
Mᵇ
129, 223, 291, 298
129, 223, 291
129, 223, 230, 233, 304
129, 223, 230, 233, 304, 344
129, 223, 230, 233, 304, 344, 355
129, 148, 223, 291, 298
129, 223, 291, 298, 311
129, 223, 256, 291
223, 291, 298
129, 223, 234, 291, 298
129, 223, 291, 298, 362
129, 223, 266, 291
223, 290, 318T
223, 304
Hᶜ
261, 304
186, 304
218, 278
354
Cambridge reference sequence
192A, 320
189
168
223
93
67
51, 145, 304
304
278, 293, 311
187, 189
189, 311
93, 291
174
261
242
260
362
93, 223
U3ᵈ
343
343, 260
Jᵉ
69, 126
69, 126, 145, 222, 261, 311
69, 126, 145, 222, 235, 261, 271
69, 126, 145, 222, 235, 261
69, 126, 261
69, 93, 126
39C, 69, 126
69, 126, 193
69, 126, 278, 366
69, 126, 300
69, 126, 311
Xᶠ
126, 189A, 223, 278 1 3 2 2 1 1 2 12
93, 189, 223, 241, 278 2 1 2 5
92, 126, 189A, 223, 278 2 2
93, 96T, 189, 223, 241, 278 1 1
92, 189A, 223, 278 1 1
Iᵍ
129, 172, 223, 311 3 1 1 5
N1bʰ
86, 129, 145, 176G, 223 1 3 1 5
Tⁱ
126, 294, 296 1 1 1 3
126, 294, 324 1 1 2
126, 294, 352 1 1
U5ʲ
28G, 192, 224, 261, 270 1 1
192, 224, 261, 270 1 1
189, 270, 311, 336 1 1
189, 270 1 1
167, 192, 270, 311, 356 1 1
256, 270 1 1
U(K)ᵏ
224, 261, 311 1 1
222, 224, 261, 311 1 1
224, 311 1 1
224, 311, 344 1 1
U1ˡ
183C, 189, 249 1 1
Wᵐ
172, 223, 231, 292 2 1 3
Total 18 9 16 42 43 10 23 20 5 3 18 25 18 25 275

ᵃ Numbers are those given by Anderson et al. (1981), plus 16,000. All variants are transitions from the reference sequence, unless indicated with a letter.
ᵇ Accounts for 26.5% of all mtDNA lineages in this study.
ᶜ Accounts for 35.6% of all mtDNA lineages in this study.
ᵈ Accounts for 10.2% of all mtDNA lineages in this study.
ᵉ Accounts for 9.1% of all mtDNA lineages in this study.
ᶠ Accounts for 7.6% of all mtDNA lineages in this study.
ᵍ Accounts for 1.8% of all mtDNA lineages in this study.
ʰ Accounts for 1.8% of all mtDNA lineages in this study.
ⁱ Accounts for 2.2% of all mtDNA lineages in this study.
ʲ Accounts for 2.2% of all mtDNA lineages in this study.
ᵏ Accounts for 1.4% of all mtDNA lineages in this study.
ˡ Accounts for 0.4% of all mtDNA lineages in this study.
ᵐ Accounts for 1.1% of all mtDNA lineages in this study.


Diversity of Maternal Lineages

Haplogroup M was identified in all 14 Romani populations and accounted for 73 individuals, or 26.5% of the total sample (table 4). Haplogroup M is rare in Europe (Richards et al. 1998; Simoni et al. 2000) but is common in Asia and eastern Africa (Quintana-Murci et al. 1999). HVS1 sequence analysis did not identify the motif characterizing the African subhaplogroup M1, defined by variants at positions 16129, 16189, 16223, 16249, and 16311 (Quintana-Murci et al. 1999), thereby pointing to the Asian origin of these Romani lineages.
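The check for the M1 motif can be made explicit by converting table 4 entries (positions listed without the leading 16,000) into full positions and testing whether all motif sites are present. This is a minimal sketch with illustrative helper names. It compares positions only, ignoring the letters that mark non-transition changes, and the last example entry in the test is hypothetical.

```python
# HVS1 motif of the African subhaplogroup M1, as listed in the text.
M1_MOTIF = {16129, 16189, 16223, 16249, 16311}

def parse_variants(entry):
    """Convert a table-4 entry such as '223, 290, 318T' into full positions.

    Table positions omit the leading 16,000; a letter marks a change other than
    a transition, which this positional check ignores.
    """
    positions = set()
    for token in entry.split(","):
        digits = "".join(ch for ch in token if ch.isdigit())
        if digits:  # 'Cambridge reference sequence' contributes no variants
            positions.add(16000 + int(digits))
    return positions

def carries_motif(entry, motif=M1_MOTIF):
    """True if every motif position is present among the entry's variants."""
    return motif <= parse_variants(entry)

print(carries_motif("129, 223, 291, 298"))           # False
print(16129 in parse_variants("129, 223, 291, 298"))  # True
```

The same parser can test for the 16129 transition that defines subhaplogroup M5, as the second call does.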

HVS1 analysis of haplogroup M samples revealed 14 sequences. The two most common haplogroup M lineages differed by a single mutational step, at position 16298 (table 4). These two lineages were present in 13 of the 14 Romani populations and accounted for 14.9% of all samples.

A transition at position 16129, which defines subhaplogroup M5 (Bamshad et al. 2001), was present in 11 of the 14 HVS1 sequences of Romani haplogroup M. One of the three lineages that do not bear the 16129 variant, namely the lineage defined by variants at positions 16223, 16291, and 16298, is closely related to haplogroup M5 lineages and may represent a back mutation at position 16129, a known mutational hotspot (Stoneking 2000). Subhaplogroup M5 was thus found to account for 97.3% of haplogroup M. A modified median-joining network (fig. 2) was used to compare haplogroup M lineages in the Roma with those observed in India (Kivisild et al. 1999; Quintana-Murci et al. 1999). All but two Romani lineages clustered together as a small subset of the overall diversity present within the Indian haplogroup M. The coalescence of haplogroup M lineages in the Roma was estimated to be 4,625 years ago (95% CI 2,000-7,250 years). This date was obtained by considering that an average of 0.6896 mutations have accumulated from the putative ancestral haplotype, that is, the haplotype with variants at positions 16129, 16223, 16291, and 16298.

Haplogroup H was the most frequent mtDNA haplogroup among the Roma (table 4). It was detected in 13 of 14 Romani populations and represented 35.6% (98 individuals) of the total sample. Haplogroup H is most common in Europe (Simoni et al. 2000) and the Near East (Richards et al. 2000) but is also found in India (Kivisild et al. 1999). HVS1 analysis of haplogroup H identified 23 sequences, two of which (defined by variants at positions 16261 and 16304 and at positions 16218 and 16278, respectively) each accounted for ~22% of haplogroup H and together comprised 20% of the overall sample. These two lineages have not been found in a large survey of Near Eastern and European individuals (Richards et al. 2000).

Figure 2
Modified median-joining network of mtDNA haplogroup M, constructed from data presented in studies by Quintana-Murci et al. (1999) and Kivisild et al. (1999) and in the present study. Position numbers follow Anderson et al. (1981); add 16,000 to each number shown. Sequences identified in the Roma are shown in red; sequences reported for Indian samples are shown in blue. Subhaplogroup designations are as proposed by Bamshad et al. (2001), plus additional subclades defined by frequent variants at positions 16189, 16318, and 16093. Branches are proportional to the number of mutations separating sequence types, except those that connect subhaplogroups.

Figure 3
Frequency distributions of the common (overall frequency >5%) male (A) and female (B) haplogroups in Romani populations. Populations in which sample size was <15 for either Y-chromosome or mtDNA haplogroup data were excluded from the analysis.

Haplogroup U3 was identified in 28 subjects (10.2% of the entire sample), most of whom (23 of 28) were Spanish and Lithuanian Roma. Only two lineages were identified by HVS1 sequencing, one of which accounted for 93% of all U3 samples (table 4). Haplogroup U3 is distributed throughout the Middle East and Europe (Richards et al. 2000).

Haplogroup X occurred in 7.6% of Romani samples and could be subdivided into five lineages by HVS1 sequencing. Three of these lineages, bearing a transversion at position 16189, have not been seen in Europe or the Middle East, where haplogroup X is widely distributed (Kivisild et al. 1999; Richards et al. 2000).


The remaining haplogroups (J, I, N1b, T, U5, U(K), U1, and W) accounted for 20% of Romani samples. Varying numbers of Romani lineages were identified by HVS1 sequencing in each haplogroup. These haplogroups have been observed in Europe, the Middle East, and India (Kivisild et al. 1999; Richards et al. 2000; Simoni et al. 2000).


Genetic Structure 


As shown in tables 2 and 4, a total of 13 paternal and 25 maternal lineages were found in more than one Romani group. The male VI-68A lineage was shared by Roma from all populations, and two pairs of closely related mtDNA lineages, of haplogroups M and H, were common to 13 and 8 Romani populations, respectively. At the same time, the frequency distributions of both major and rare male and female lineages differed dramatically between Romani populations (fig. 3).
PC analysis was based on Y-chromosome and mtDNA haplogroup frequencies in Romani populations. The resultant PC plots resolved the genetic structure better than a neighbor-joining tree (Nei 1987) based on Y STR haplotypes (not shown). The PC plots are presented in figure 4.
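The PC step can be sketched as follows: each population is a row of lineage frequencies, the columns are centered, and the leading eigenvectors of the covariance matrix give the plot axes, with each eigenvalue's share of the total variance indicating how much variation an axis captures. This is a minimal, dependency-free sketch assuming covariance-based (not correlation-based) PCA computed by power iteration with deflation; the helper names and toy frequencies are hypothetical, and the authors' software and scaling choices are not stated in this section.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def center_columns(data: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each column (lineage) mean from every row (population)."""
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of the centered columns."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpairs(cov: Matrix, k: int, iters: int = 1000,
                   seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    p = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        # Random start: a constant vector can be orthogonal to the leading
        # eigenvector when each row of frequencies sums to 1.
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflation: remove the component just found before the next pass.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def pca(data: Sequence[Sequence[float]], k: int = 2):
    """Return per-population PC scores and each PC's share of the total variance."""
    centered = center_columns(data)
    cov = covariance(centered)
    total = sum(cov[i][i] for i in range(len(cov)))
    pairs = top_eigenpairs(cov, k)
    explained = [lam / total if total > 0 else 0.0 for lam, _ in pairs]
    scores = [[sum(x * vj for x, vj in zip(row, v)) for _, v in pairs]
              for row in centered]
    return scores, explained


if __name__ == "__main__":
    # Hypothetical frequencies (rows: populations; columns: haplogroups).
    freqs = [[0.6, 0.4, 0.0], [0.4, 0.6, 0.0], [0.5, 0.5, 0.0]]
    scores, explained = pca(freqs)
    print(explained)
    print(scores)
```

Power iteration keeps the sketch free of external dependencies; with a numerical library available, an eigendecomposition or SVD of the centered matrix would be the more usual choice.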

Two clusters, consistently present in both Y-chromosome and mtDNA analysis, were formed by the Monteni, Intreni, Lingurari, Kalderash, and Lom on one hand and by the Feredjelli and Turgovtzi on the other. The Spanish and Lithuanian Roma clustered together in the mtDNA analysis, and the Kalaidjii North and South clustered together in the Y-chromosome comparisons.

To examine the relevance of different cultural, historical, and geographic classification criteria to the genetic structure of the Roma, we used AMOVA based on Y STR data and mtDNA HVS1 sequences (table 5). The country-of-residence comparison, in which all Roma from Bulgaria were compared with those from Lithuania and those from Spain, showed no significant intergroup differences. The same result was obtained with comparisons based on town of residence, in which three pairs of Romani populations living in close proximity in three small towns in Bulgaria were examined. In the analysis based on ethnonyms reflecting traditional trade, the comparison of bowl makers, tinsmiths, traders, and livestock dealers showed no significant intergroup differences.
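The AMOVA comparisons rest on partitioning sums of squared pairwise distances between haplotypes into variance components. Below is a minimal sketch of the simplest, single-level case (individuals within populations, as in the "Total sample" row of Table 5), assuming haploid data, a site-mismatch count as the squared distance, and a permutation test that reshuffles individuals among populations. The grouped comparisons add a third hierarchical level that this sketch does not implement. The function names and toy haplotypes are hypothetical, and Y STR data would need a different distance (for example, one based on repeat-number differences), which is also an assumption here.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Distance = Callable[[str, str], float]


def mismatches(a: str, b: str) -> float:
    """Squared distance between two aligned haplotypes: number of differing sites."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to equal length")
    return float(sum(x != y for x, y in zip(a, b)))


def _ssd(haplotypes: Sequence[str], dist: Distance) -> float:
    """Sum of squared deviations: summed pairwise squared distances / sample size."""
    n = len(haplotypes)
    if n == 0:
        return 0.0
    return sum(dist(a, b) for a, b in combinations(haplotypes, 2)) / n


def amova_one_level(pops: Sequence[Sequence[str]],
                    dist: Distance = mismatches) -> Tuple[float, float, float]:
    """Return (among-population variance, within-population variance, Phi_ST)."""
    sizes = [len(p) for p in pops]
    n, k = sum(sizes), len(pops)
    if k < 2 or n <= k:
        raise ValueError("need at least two populations and more individuals than populations")
    ssd_total = _ssd([h for p in pops for h in p], dist)
    ssd_within = sum(_ssd(p, dist) for p in pops)
    msd_among = (ssd_total - ssd_within) / (k - 1)
    msd_within = ssd_within / (n - k)
    n0 = (n - sum(s * s for s in sizes) / n) / (k - 1)
    var_within = msd_within
    var_among = (msd_among - var_within) / n0  # can be negative in AMOVA
    total = var_among + var_within
    phi_st = var_among / total if total != 0 else 0.0
    return var_among, var_within, phi_st


def permutation_p(pops: Sequence[Sequence[str]], dist: Distance = mismatches,
                  n_perm: int = 999, seed: int = 1) -> float:
    """P value: share of random reassignments with Phi_ST >= the observed value."""
    observed = amova_one_level(pops, dist)[2]
    pool = [h for p in pops for h in p]
    sizes = [len(p) for p in pops]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        shuffled: List[List[str]] = []
        start = 0
        for size in sizes:
            shuffled.append(pool[start:start + size])
            start += size
        if amova_one_level(shuffled, dist)[2] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical aligned haplotypes for three populations.
    pops = [["ACGT", "ACGT", "ACGA"], ["TCGT", "TCGA", "TCGT"], ["ACGT", "TCGT", "ACGA"]]
    va, vw, phi = amova_one_level(pops)
    print(f"among: {100 * va / (va + vw):.1f}%  Phi_ST = {phi:.3f}  P = {permutation_p(pops):.3f}")
```

Negative among-population components can arise from this estimator when populations differ from one another no more than random sets of individuals would.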
Intergroup differences accounted for a significant pro- 
portion of the variance only when language and the his- 
tory of migrations were used for classification of Romani 
populations. In the language-based classification, the 
comparisons included speakers of (a) Balkan dialects of 


Figure 4
Two-dimensional PC plots based on Y STR haplotype frequencies (A) and mtDNA haplogroup frequencies (B). The population affinities shown are based on 51% (A) and 42.6% (B) of the total variation present in the Y-chromosome and mtDNA data, respectively, for the entire sample.
Table 5
AMOVA of Romani populations based on Y STR haplotypes and mtDNA HVS1 sequences: percentage of variation (P) [a]

| Grouping criterion | Among groups: Y STR | Among groups: mtDNA | Among populations within groups: Y STR | Among populations within groups: mtDNA | Within populations: Y STR | Within populations: mtDNA |
|---|---|---|---|---|---|---|
| Total sample | | | 13.0% (<.00001) | 6.2% (<.00001) | 87.0% (<.00001) | 93.8% (<.00001) |
| Country of residence [b] | -5.1% (.79277) | 4.0% (.01760) | 15.2% (<.00001) | 4.8% (<.00001) | 89.9% (<.00001) | 91.2% (<.00001) |
| Town of residence [c] | 6.7% (.21408) | 0.5% (.32551) | 7.5% (.00391) | 0.8% (.26686) | 85.8% (<.00001) | 98.7% (.16618) |
| Trade/group (ethnonym) [d] | 7.9% (.08113) | 4.7% (.01622) | 8.5% (<.00001) | 2.1% (.05083) | 83.6% (<.00001) | 93.2% (<.00001) |
| Religion [e] | 6.2% (.03617) | 4.3% (.00196) | 8.0% (<.00001) | 2.9% (<.00001) | 85.8% (<.00001) | 92.8% (<.00001) |
| Language [f] | 6.5% (.07234) | 6.3% (<.00001) | 7.2% (<.00001) | 0.7% (<.00001) | 86.3% (<.00001) | 92.9% (<.00001) |
| Historical migration [g] | 10.5% (<.00001) | 5.0% (<.00001) | 5.3% (<.00001) | 3.0% (<.00001) | 84.2% (<.00001) | 92.0% (<.00001) |

[a] With Bonferroni correction, P < .0083.
[b] For Group 1 populations Tu, Fe, KN, KC, KW, Mo, In, Lo, Ka, LN, LS, and KS; Group 2 population SR; and Group 3 population LR.
[c] For Group 1 populations Lo and KN; Group 2 populations Tu and Fe; and Group 3 populations KS and KW.
[d] For Group 1 populations Mo, In, LN, and LS; Group 2 populations Tu and SR; Group 3 populations KN and KS; and Group 4 population Lo.
[e] For Group 1 populations Tu, Fe, KS, and KC; Group 2 populations Mo, In, Ka, LN, and LS; Group 3 populations Lo, SR, KN, and KW; and Group 4 population LR.
[f] For Group 1 populations Tu, KN, KC, and KW; Group 2 population Fe; Group 3 populations KS, Lo, and Ka; Group 4 populations Mo, In, LN, and LS; Group 5 population LR; and Group 6 population SR.
[g] For Group 1 populations Tu, Fe, KN, KW, and KC; Group 2 populations Lo, Ka, KS, LN, LS, Mo, and In; and Group 3 populations SR and LR.


Romanes, (b) Vlax dialects (Old as well as New Vlax), (c) Romanian, and (d) the languages of the surrounding majority populations. The main difference between the language-based and the migration-based groupings concerned the Lingurari, Monteni, and Intreni: they formed the group of Romanian speakers in the language classification, whereas, in the classification based on migrational history, they were placed together with the speakers of Vlax Romanes dialects. The language division resulted in significant intergroup differences for the female lineages only. Highly significant intergroup differences for both paternal and maternal lineages were observed only when classification was based on the history of migrations, comparing the old settlers in the Balkans with the migrants to Wallachia and Moldavia and with those moving to northern and western Europe. This comparison showed that ~10% of the variance for the Y chromosome and 5% for mtDNA (P < .00001 for both) was due to differences between the migrational groups.


Discussion 


The Roma do not have their own written history; there- 
fore, theories about their origins and migrations are 
based on legends or on linguistics and cultural anthro- 
pology. Early European historical records refer to the 
Roma as Egyptians, and the term “Gypsy” is thought 
to reflect that assumption (Fraser 1992). Another pop- 
ular legend is derived from an 11th-century chronicle by 
a Persian historian, describing a group of 10,000- 
12,000 musicians and entertainers given as a gift to the 
ruler of Persia, Shah Bahram Gur, by an Indian Maharaja during the 5th century (Fraser 1992). The theory of the Indian origins of the Roma (reviewed in Fraser 1992) is based on the similarities between Romanes and languages of the Indian subcontinent. However, the lack of a close relationship with any specific living language or dialect in India has given rise to the concept of Romanes resulting from the "mixing of linguistic subsystems in the context of increased interaction among speakers of these varieties" (Hancock 2000, p. 2). This linguistic theory has been linked to the historical period of the Islamic invasions of India and proposes that the Roma derive from the ethnically diverse martial society of the Rajputs, as well as from camp followers drawn from the lowest Varna and the out-caste or untouchable groups (Hancock 2000). The argument of diverse origins rooted in India is supported by the social organization of the Roma, whose multiple endogamous populations with professional ethnonyms bear close resemblance to the jatis of India (Fraser 1992; Marushiakova and Popov 1997). The endogamous professional-group organization could thus have been an inherent social characteristic of the proto-Roma at the time of the exodus from India. It is also conceivable that the fragmentation into small populations occurred within Europe as a means of greater mobility, and thus of survival in the face of repressive legislation and persecution (Hancock 1987; Fraser 1992; Liégeois 1994), and was consolidated further by geographic dispersal and cultural and linguistic diversification. These two scenarios would be expected to leave different imprints on present-day genetic structure, with implications for genetic research, especially into complex disorders.

This study has demonstrated the sharing of identical Asian-specific paternal and maternal lineages among all Romani populations. Nearly 45% of Y chromosomes belong to haplogroup VI-68, and a single lineage within that haplogroup, found across Romani populations, accounts for almost one-third of Romani males. A similar preservation of a highly resolved male lineage has been reported elsewhere only for Jewish priests (Thomas et al. 1998). Likewise, the Asian-specific mtDNA haplogroup M is found in 13 of 14 Romani populations and accounts for 26.5% of maternal lineages in the Roma. The data provide strong evidence of Asian origins, in contrast with claims that the Roma are a socially defined population of European descent (Okely 1983; Wexler 1997).

Analysis of diversity within haplogroups VI-68 and M provides insight into the genetic composition of the ancestral population. The Y STR haplotypes within haplogroup VI-68 are closely related, suggesting recent diversification by mutational processes, and cluster as a subset of the overall diversity of Asian haplogroup VI-68. Detailed comparisons between the diversity in the Romani VI-68 lineage and that in the Asian haplogroup VI-68 will become possible when more information about male lineages in the Indian subcontinent becomes available. Most mtDNA haplogroup M lineages belong to subhaplogroup M5 (Bamshad et al. 2001) and form a small subset of the diversity within Indian haplogroup M. Again, the close genealogical relationship suggests that diversity has arisen by mutation rather than through diverse origins or admixture. The relatively recent ages determined for haplogroups VI-68 and M in this study suggest that the ethnogenesis of the Roma can be understood as a profound bottleneck event. Although identification of the parental population of the proto-Roma has to await better understanding of genetic diversity in the Indian subcontinent, our results suggest a limited number of related founders, compatible with a small group of migrants splitting from a distinct caste or tribal group.
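One way to put a number on the "limited diversity" argument is a within-haplogroup haplotype diversity statistic: a large sample dominated by one or a few closely related haplotypes yields a low value. Below is a minimal sketch using Nei's unbiased haplotype diversity, assuming each sampled Y STR haplotype or HVS1 sequence is encoded as a hashable label. This section does not state which diversity statistic the authors computed, and the toy samples are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical samples of 20: one dominated by a single lineage, one fully diverse.
    founder_like = ["h1"] * 18 + ["h2"] * 2
    diverse = [f"h{i}" for i in range(20)]
    print(round(haplotype_diversity(founder_like), 3))  # 0.189
    print(round(haplotype_diversity(diverse), 3))  # 1.0
```

Diversity alone does not distinguish few founders from a recent bottleneck; that is why the text combines it with the close genealogical relationships seen in the networks.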

The present findings and the published data on global diversity do not allow a distinction between additional founding lineages and early admixture for Y-chromosome haplogroup VI-56 and the less common haplogroups, shown to occur in Asia and the Middle East (Underhill et al. 2000, 2001), and for mtDNA haplogroups H and X, widely distributed from Europe to India (Kivisild et al. 1999; Simoni et al. 2000; Richards et al. 2000). Both the close relationship between haplotypes within haplogroup VI-56 and its frequency distribution among the Roma point to introduction by a small number of related males. The fact that the common Romani mtDNA haplogroup H and X lineages have not been found among a large number of Middle Eastern and European individuals (Richards et al. 2000) suggests that they might be founding lineages of Indian origin. Regardless of the history of these lineages, the observed pattern points to greater female than male diversity in the early Romani population.

Although the sharing of genetic lineages supports the common origins of the Roma, differentiation between Romani populations is evidenced by the distribution of male and female lineages (fig. 3). The results of the AMOVA and PC analysis provide insight into the contribution that different factors have made to shaping the genetic structure of Romani populations. The irrelevance of geographic criteria for studying the Roma has been emphasized repeatedly by cultural anthropologists (Petulengro 1915-16; Fraser 1992; Liégeois 1994; Marushiakova and Popov 1997), yet country of residence has been used consistently as the descriptor in genetic studies of the Roma (reviewed by Kalaydjieva et al. [2001b]). Our present results indicate that geography has no detectable relationship to genetic structure, even when Romani populations living in close proximity in the same small town are considered. This contrasts with the findings for other European populations, in which geographic distance (rather than culture and language) has been found to play the major role (Rosser et al. 2000). The lack of genetic correlation with recently acquired religions (Muslim or Christian) is hardly surprising. Interestingly, traditional trade reflected in the ethnonym, an important factor in defining the self-identity of Romani populations, was found to be a poor grouping criterion. By far the most significant differences between groups of populations were observed when language and especially history of migrations were used as the classification criteria in the AMOVA comparisons. These two indicators are closely related, since the classification of Romanes dialects is based mainly on external linguistic influences and borrowings. The significant difference between language groups for female (but not male) lineages possibly reflects the strict endogamy rules practiced by the Romanian-speaking Roma toward females from other populations. The PC analysis also provided strong support for the migrational grouping of populations.

The European migrations of the Roma have followed three major streams. Whereas the majority settled within the Balkan provinces of the Ottoman Empire, some headed to the autonomous principalities of Wallachia and Moldavia, north of the Danube (in present-day Romania), and others continued the journey north and west. Ottoman tax registries suggest that the number of Roma initially settling in the Empire would have been small (Marushiakova and Popov 1997), and early historical records from Western Europe invariably describe Gypsies arriving as a group of 50-300 individuals led by an elder (Colocci 1889). The early-settled Romani population south of the Danube and the superimposed migrations from Wallachia and Moldavia, of small groups of runaway slaves during the 17th and 18th centuries and of larger numbers after the abolition of Gypsy slavery during the 19th century (Marushiakova and Popov 2001b), have spawned >50 socially diverse Romani populations in Bulgaria alone (Marushiakova and Popov 1997). Our data indicate that current genetic structure results mainly from the early splits and divergent routes within Europe. Two processes, genetic drift and different levels and sources of admixture, appear to have played a role in the subsequent differentiation of populations. The effects of differential admixture are illustrated by the distribution of Y-chromosome haplogroups VI-52 and IX-104, whose occurrence among the Roma reflects the reported clinal distribution in Europe (Semino et al. 2000). Intrahaplogroup diversities in the Roma are consistent with multiple independent admixture events. Similar examples are provided by mtDNA haplogroups H (excluding the two common lineages), X, T, and U5. The effects of drift are likely to account for the different frequencies of the major common lineages in the diverse Romani populations (fig. 3), such as the uneven representation of Y-chromosome haplogroup VI-56 and mtDNA haplogroup U3, both of which occur in multiple Romani populations.
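The drift component of this argument can be illustrated with a neutral Wright-Fisher simulation, in which isolated populations founded with the same lineage frequency diverge by chance alone. Below is a minimal sketch assuming a haploid model (in the spirit of uniparental Y-chromosome and mtDNA markers), constant population size, no migration or admixture, and hypothetical parameter values. It illustrates the mechanism only and is not a model fitted to Romani data.

```python
import random
from typing import List


def wright_fisher(freq: float, pop_size: int, generations: int,
                  rng: random.Random) -> float:
    """Frequency of one lineage after neutral drift in a haploid Wright-Fisher population."""
    for _ in range(generations):
        if freq in (0.0, 1.0):
            break  # lineage lost or fixed; drift can no longer change it
        # Each member of the next generation copies a random parent, so the
        # number of carriers is a binomial draw with the current frequency.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
    return freq


def simulate_split(start_freq: float, pop_size: int, generations: int,
                   n_pops: int, seed: int = 0) -> List[float]:
    """Final frequencies in independent populations founded with the same frequency."""
    rng = random.Random(seed)
    return [wright_fisher(start_freq, pop_size, generations, rng) for _ in range(n_pops)]


if __name__ == "__main__":
    # Hypothetical settings: 10 isolated populations, 25 generations, two sizes.
    for size in (50, 5000):
        finals = simulate_split(0.3, size, 25, 10)
        print(size, [round(f, 2) for f in finals])
```

Runs with a small and a large population size typically show the small populations scattering much more widely around the starting frequency, with some losing or fixing the lineage. This is the sense in which small founding groups amplify drift.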

Knowledge of the origins and diversification of the Roma should prove useful in the design of future medical genetic studies. Our results need further confirmation through the study of larger samples, with wider representation of western-European Roma and of populations speaking the two major varieties of Balkan dialects of Romanes. One should also note that current genetic data may not accurately mirror the original composition of the migrant proto-Romani population; the profound effect of genetic drift due to small population size would have been compounded by the history of violent persecution of the Roma in Europe, culminating in the death camps of the Second World War (Fings et al. 1997). Nonetheless, the findings point to an interesting difference between the biological and the cultural history of the Roma. Whereas genetic differentiation appears to carry the imprint of the early European history of the Roma, social diversification seems to be the product of a recent restitution of the traditions of the ancient country of origin.